SIZING AN ENVIRONMENTAL DATA SET


Introduction 


This report presents a basic equation and three derivative equations that can be
used to size environmental data sets. These equations were formulated to document
mass storage requirements for USAFETAC's environmental support to the Worldwide
Military Command and Control System (WWMCCS). With minor modifications, these
equations have a much wider application. They are applied to a specific example
taken from a request for environmental support.

A set of diagnostic equations that can be used in quantitative quality control
applications is also included.

The  Basic  Equation 

Each climatological study begins with weather observations taken at specific
times and locations and containing a measure of meteorological elements. These
observations are collected at numerous locations and times for varying periods of
record (POR) and stored for later summarizing, analysis, or both.

Equation  (1)  provides  an  estimate  of  the  amount  of  storage  that  will  be  required 
for  a  particular  data  set. 


$$S_o = (P)(L)(E)(Y) \tag{1}$$

where $S_o$ = storage required for the record of observations
      $P$ = number of meteorological elements
      $L$ = number of observation sites
      $Y$ = period of record (POR) parameter
      $E$ = storage parameter

The POR parameter, $Y$, can be expanded to

$$Y = (y)(h)(d)$$

where $y$ = number of years of record
      $h$ = number of observations per day
      $d$ = number of days per calendar unit

$d = 365$ (days/year) for most applications, but
$d = 31$ (days/January) for some applications.

The storage parameter, $E$, can be expressed in terms dictated by the storage
medium, e.g., observations per page for paper storage or observations per microfiche.
This paper will use

$$E = 8B$$

where $B$ = bytes per parameter, and
      8 = bits per byte (IBM 360/44).

Making the above substitutions in Equation (1) gives

$$S_o = (P)(L)(8)(B)(y)(h)(365) \tag{1a}$$

which is the form that is useful at the USAF Environmental Technical Applications
Center (USAFETAC).
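
Equation (1a) can be sketched in a few lines of Python. This is only an illustration: the function names and the sample inputs (5 elements, 10 sites, 2 bytes per parameter, 3 years of hourly observations) are hypothetical and do not come from the report.

```python
def por_parameter(years: int, obs_per_day: int, days_per_unit: int = 365) -> int:
    """Period-of-record parameter Y = (y)(h)(d)."""
    return years * obs_per_day * days_per_unit


def raw_storage_bits(elements: int, sites: int, bytes_per_parameter: int,
                     years: int, obs_per_day: int,
                     days_per_unit: int = 365, bits_per_byte: int = 8) -> int:
    """Equation (1a): S_o = (P)(L)(8)(B)(y)(h)(d), expressed in bits."""
    storage_parameter = bits_per_byte * bytes_per_parameter  # E = 8B
    period = por_parameter(years, obs_per_day, days_per_unit)  # Y
    return elements * sites * storage_parameter * period


# Hypothetical inputs, chosen only to exercise the formula.
example_bits = raw_storage_bits(elements=5, sites=10, bytes_per_parameter=2,
                                years=3, obs_per_day=24)
print(example_bits)
```

Passing `days_per_unit=31` gives the single-month (January) form of the POR parameter mentioned above.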

Derived  Equations 

Climatology  requires  summarization.  Equation  (2)  applies  to  summarized  data. 


$$S_s = (F)(L)(K)(A)(E) \tag{2}$$

where $S_s$ = storage required for the record of summarized observations
      $F$ = summarized parameter of measurement of the meteorological element
      $L$ = number of observation sites summarized
      $K$ = number of times (e.g., hours) for which $F$ is computed
      $A$ = number of periods (e.g., months) for which $F$ is computed
      $E$ = as in Equation (1)

We can carry the derivation one step further. If we need the elements summarized
in combination, e.g., ceiling/visibility, wind chill factors (wind/temperature), or
by location (when is the temperature the same at two or more locations), or in
combination of elements at two or more locations (probability of favorable takeoff
conditions at location A and favorable landing conditions at location B 6 hours
later, a mission success indicator), we can use Equations (3) and (4) to estimate
the storage required.

Equation  (3)  allows  one  to  estimate  the  storage  required  for  "any  combination  of 
elements."  Equation  (4)  is  used  for  a  combination  of  elements  at  multiple  locations. 


$$S_c = (C)(L)(K)(A)(E) \tag{3}$$

$$S_{cL} = (C)(L')(K)(A)(E) \tag{4}$$

where

$$C = \frac{P!}{(P-N)!\,N!}, \qquad L' = \frac{L!}{(L-M)!\,M!}$$

      $S_c$ = storage required for the record of combinations of summarized
              observations
      $S_{cL}$ = storage required for the record of multiple-location combinations
              of summarized observations
      $C$ = number of meteorological elements taken $N$ at a time
      $L'$ = number of locations taken $M$ at a time

When $N = 1$, $C = P$, and when $M = 1$, $L' = L$.
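
The two combination counts are ordinary binomial coefficients, so they can be checked with Python's standard `math.comb`. The sketch below is illustrative only; the function names are hypothetical, and the sample values (23 elements, 500 locations) are the ones the report uses later in its practical example.

```python
from math import comb


def element_combinations(elements: int, at_a_time: int) -> int:
    """C = P! / ((P - N)! N!): elements taken N at a time."""
    return comb(elements, at_a_time)


def location_combinations(sites: int, at_a_time: int) -> int:
    """L' = L! / ((L - M)! M!): locations taken M at a time."""
    return comb(sites, at_a_time)


def combined_storage_bits(elements: int, n: int, sites: int, m: int,
                          times: int, periods: int, storage_parameter: int) -> int:
    """Equation (4); with m = 1 it reduces to Equation (3) because L' = L."""
    return (element_combinations(elements, n)
            * location_combinations(sites, m)
            * times * periods * storage_parameter)


# The limiting cases stated in the text: N = 1 gives C = P, M = 1 gives L' = L.
print(element_combinations(23, 1), location_combinations(500, 1))
# Pairs of elements, and triples of stations.
print(element_combinations(23, 2), location_combinations(500, 3))
```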

Diagnostic  Equations 

To  test  "completeness"  of  a  summarized  data  set  we  can  use  Equations  (5)  and  (6). 


$$D_e = (C)(L)(Y) \tag{5}$$

$$D_o = (C)(L)(Y) \tag{6}$$

where $D_e$ = number of observations expected in a complete data set
      $D_o$ = number of observations found in the data set
      $C$ = number of combinations of elements
      $Y = (y)(h)(d)$

The ratio of $D_o$ to $D_e$ gives a measure of completeness, and

$$(100)\,\frac{D_o}{D_e} = K \tag{7}$$

where $K$ = the percentage "complete"

For example, in using the Revised Uniform Summary of Surface Weather Observations
(RUSSWO) for Altus AFB, OK, we find that 15,463 observations ($D_o$) were used to
compute January all-hours ceiling versus visibility for the POR 1944-45 and 1954-72.
Using Equation (5) we find $C = 1$, $L = 1$, $y = 21$, $h = 24$, and $d = 31$
(days/January). Then

$$D_e = (21)(24)(31) = 15,624$$

and

$$(100)\,\frac{D_o}{D_e} = (100)\,\frac{15,463}{15,624} = 98.97\%\ \text{complete}$$
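
The completeness check in Equations (5) and (7) is simple enough to script. The following sketch uses hypothetical function names and reruns the Altus AFB figures quoted above.

```python
def expected_observations(combinations: int, sites: int, years: int,
                          obs_per_day: int, days_per_unit: int) -> int:
    """Equation (5): D_e = (C)(L)(Y), with Y = (y)(h)(d)."""
    return combinations * sites * years * obs_per_day * days_per_unit


def percent_complete(found: int, expected: int) -> float:
    """Equation (7): K = 100 * D_o / D_e."""
    if expected <= 0:
        raise ValueError("expected observation count must be positive")
    return 100.0 * found / expected


# Altus AFB January example: 21 Januaries of hourly observations.
d_e = expected_observations(combinations=1, sites=1, years=21,
                            obs_per_day=24, days_per_unit=31)
k = percent_complete(found=15463, expected=d_e)
print(d_e, round(k, 2))
```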


Practical  Example 

USAFETAC received a proposal for support to the US Army requesting "all parameters
used in Army operations for up to 500 locations. These parameters may be
needed in any combination." A list of 23 elements followed. Using Equation (1a)
with $P = 23$, $L = 500$, $h = 24$ (hours/day), $y = 10$ (years), and $B = 1$, we
find $S_o = (8.0592)(10^9)$ bits. This is about thirty 2400-foot magnetic tapes
written at a density of 1600 bits per inch.

To satisfy this request USAFETAC would need to summarize by month, plus an
annual value, all of the parameters for all locations and for all hours.
Analytically, then, the assumptions are $P = F$ (23), $L = L$ (500), $h = K$ (24),
$y = 10$, $E = E$, and $A = 13$ (monthly values plus annual). From Equation (2) we
find:

$$S_s = (23)(500)(24)(13)(8) = (2.8704)(10^7)$$

Or, taking a ratio of $S_o / S_s$:

$$\frac{S_o}{S_s} = \frac{(P)(L)(h)(10)(365)(E)}{(F)(L)(K)(13)(E)}$$

If we allow 365/13 to be approximated by 30, we find that

$$\frac{S_o}{S_s} = 300 \tag{8}$$

Equation (8) says that 300 tapes of observations can be reduced to one tape of
summarized data under the stated assumptions. This ratio approximates the current
procedures at USAFETAC.
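
As a rough check on the arithmetic, the raw and summarized sizes can be computed directly from Equations (1a) and (2). The function names below are hypothetical; the inputs are the values stated in this example. The unrounded ratio is 3650/13, or about 281, which the text rounds to 300 by treating 365/13 as 30.

```python
def raw_bits(elements: int, sites: int, storage_parameter: int,
             years: int, obs_per_day: int, days_per_year: int = 365) -> int:
    """Equation (1a): S_o = (P)(L)(E)(y)(h)(365), with E = 8B."""
    return elements * sites * storage_parameter * years * obs_per_day * days_per_year


def summary_bits(parameters: int, sites: int, times: int,
                 periods: int, storage_parameter: int) -> int:
    """Equation (2): S_s = (F)(L)(K)(A)(E)."""
    return parameters * sites * times * periods * storage_parameter


s_o = raw_bits(elements=23, sites=500, storage_parameter=8, years=10, obs_per_day=24)
s_s = summary_bits(parameters=23, sites=500, times=24, periods=13, storage_parameter=8)
ratio = s_o / s_s

print(s_o, s_s, round(ratio, 2))
```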
Moving to Equation (3), we find that the total number of combinations of P
parameters taken 1, or 2, ..., or N at a time can be expressed by

$$\sum_{i=1}^{N} C_i = 2^N - 1$$

When $N = 23$, $\sum C_i = 8,388,607$. Table 1 shows a step-by-step increase in the
required storage as more and more combinations are considered. The TOTALS line in
Table 1 shows that 300 tapes of "raw" observations of 23 elements can generate
364,855 tapes of combinations of elements, and there remain 300 tapes of
observations for a total storage requirement of 365,155 tapes.
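
The closed form for the total number of combinations can be confirmed by summing the binomial coefficients directly. This is a minimal sketch with a hypothetical function name, using the standard `math.comb`.

```python
from math import comb


def total_combinations(elements: int) -> int:
    """Sum of C_i for i = 1..N, i.e. every non-empty subset of the elements."""
    return sum(comb(elements, i) for i in range(1, elements + 1))


total_23 = total_combinations(23)
print(total_23, total_23 == 2**23 - 1)
```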

If we take stations three at a time from a population of 500 stations, we find
$L'$ = 20,708,500. Applying this to the total in Table 1 and using Equation (4), we
find that we can easily generate (6)(101*) tapes of summaries for any combination
of elements for any combination of three stations from a population of 500 stations.


Table 1. Computed Storage Requirements

     N                  C     RATIO S_o/S_c    TAPES S_c PER 300 TAPES S_o

N =  1 or 22           23        300.                       1
N =  2 or 21          253         27.22273                 11
N =  3 or 20        1,771          3.89610                 77
N =  4 or 19        8,855          0.78022                385
N =  5 or 18       33,649          0.20506              1,463
N =  6 or 17      100,947          0.06835              4,390
N =  7 or 16      245,157          0.02815             10,658
N =  8 or 15      490,314          0.01407             21,322
N =  9 or 14      817,190          0.00844             35,545
N = 10 or 13    1,144,066          0.00603             49,751
N = 11 or 12    1,352,078          0.00510             58,824
N = 23                  1      6,900.                    1/23

TOTALS    $\sum_{i=1}^{N} C_i$ = 8,388,607                        364,855

Note 

USAFETAC has, at Scott AFB, IL and Asheville, NC, about 25,000 magnetic tapes
dedicated to storing observations. These data include about 9^00 stations reporting
surface data and 2000 stations reporting upper-air data for which "suitable" periods
of record exist. An additional 20,000 magnetic tapes contain summaries and other
derived information.


